Driving Markov chain Monte Carlo with a dependent random stream
Markov chain Monte Carlo is a widely-used technique for generating a
dependent sequence of samples from complex distributions. Conventionally, these
methods require a source of independent random variates. Most implementations
use pseudo-random numbers instead because generating true independent variates
with a physical system is not straightforward. In this paper we show how to
modify some commonly used Markov chains to use a dependent stream of random
numbers in place of independent uniform variates. The resulting Markov chains
have the correct invariant distribution without requiring detailed knowledge of
the stream's dependencies or even its marginal distribution. As a side-effect,
sometimes far fewer random numbers are required to obtain accurate results.
Comment: 16 pages, 4 figures
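For context, the conventional setup that the abstract starts from can be sketched as follows. This is not the paper's dependent-stream construction; it is an ordinary random-walk Metropolis sampler that consumes independent uniform variates from an iterator. The function names, the uniform proposal, and the standard-normal target are illustrative assumptions.

```python
import math
import random


def random_walk_metropolis(log_density, x0, n_steps, step_size, uniforms):
    """Random-walk Metropolis driven by an iterator of uniform(0, 1) variates.

    Assumption: the variates in `uniforms` are independent, which is the
    conventional requirement the abstract refers to.
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        # One uniform drives the symmetric proposal, a second drives the
        # accept/reject decision.
        u_proposal = next(uniforms)
        u_accept = next(uniforms)
        proposal = x + step_size * (2.0 * u_proposal - 1.0)
        log_ratio = log_density(proposal) - log_density(x)
        if u_accept < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


def independent_uniforms(seed):
    """Hypothetical stream: pseudo-random, independent uniform variates."""
    rng = random.Random(seed)
    while True:
        yield rng.random()


def standard_normal_log_density(x):
    # Log density up to an additive constant.
    return -0.5 * x * x


samples = random_walk_metropolis(
    standard_normal_log_density, 0.0, 20000, 2.0, independent_uniforms(0)
)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

Passing the stream in as an iterator makes the sampler's dependence on its random source explicit, which is the component the paper proposes to replace.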
A nonparametric HMM for genetic imputation and coalescent inference
Genetic sequence data are well described by hidden Markov models (HMMs) in
which latent states correspond to clusters of similar mutation patterns. Theory
from statistical genetics suggests that these HMMs are nonhomogeneous (their
transition probabilities vary along the chromosome) and have large support for
self transitions. We develop a new nonparametric model of genetic sequence
data, based on the hierarchical Dirichlet process, which supports these self
transitions and nonhomogeneity. Our model provides a parameterization of the
genetic process that is more parsimonious than other more general nonparametric
models which have previously been applied to population genetics. We provide
truncation-free MCMC inference for our model using a new auxiliary sampling
scheme for Bayesian nonparametric HMMs. In a series of experiments on male X
chromosome data from the Thousand Genomes Project and also on data simulated
from a population bottleneck we show the benefits of our model over the popular
finite model fastPHASE, which can itself be seen as a parametric truncation of
our model. We find that the number of HMM states found by our model is
correlated with the time to the most recent common ancestor in population
bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics
applied to large and complex genetic data.
Random Tessellation Forests
Space partitioning methods such as random forests and the Mondrian process
are powerful machine learning methods for multi-dimensional and relational
data, and are based on recursively cutting a domain. The flexibility of these
methods is often limited by the requirement that the cuts be axis aligned. The
Ostomachion process and the self-consistent binary space partitioning-tree
process were recently introduced as generalizations of the Mondrian process for
space partitioning with non-axis aligned cuts in the two dimensional plane.
Motivated by the need for a multi-dimensional partitioning tree with non-axis
aligned cuts, we propose the Random Tessellation Process (RTP), a framework
that includes the Mondrian process and the binary space partitioning-tree
process as special cases. We derive a sequential Monte Carlo algorithm for
inference, and provide random forest methods. Our process is self-consistent
and can relax axis-aligned constraints, allowing complex inter-dimensional
dependence to be captured. We present a simulation study, and analyse gene
expression data of brain tissue, showing improved accuracies over other
methods.
Comment: 11 pages, 4 figures
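As a point of reference for the axis-aligned special case mentioned above, a Mondrian process on a box can be sampled with a short recursion. The sketch below is the standard generative description of the Mondrian process, not the RTP itself and not the paper's sequential Monte Carlo algorithm; the function name, the list-of-intervals box representation, and the budget value are assumptions for illustration.

```python
import math
import random


def sample_mondrian(box, budget, rng):
    """Sample an axis-aligned Mondrian partition of `box`.

    box: list of (low, high) intervals, one per dimension.
    budget: remaining lifetime; larger budgets give finer partitions.
    Returns a list of boxes (the leaves of the partition).
    """
    lengths = [high - low for low, high in box]
    # The waiting time until the next cut is exponential with rate equal
    # to the sum of the side lengths of the box.
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]
    # Choose the cut dimension in proportion to side length, then cut
    # uniformly along that side.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    low, high = box[dim]
    cut = rng.uniform(low, high)
    left = list(box)
    left[dim] = (low, cut)
    right = list(box)
    right[dim] = (cut, high)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))


def volume(box):
    return math.prod(high - low for low, high in box)


rng = random.Random(1)
unit_square = [(0.0, 1.0), (0.0, 1.0)]
cells = sample_mondrian(unit_square, 2.0, rng)
total_volume = sum(volume(cell) for cell in cells)
```

Every cut here is perpendicular to a coordinate axis, which is exactly the restriction the abstract says the RTP relaxes.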
Modeling Population Structure Under Hierarchical Dirichlet Processes
We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods.
Genome-Wide Association with Uncertainty in the Genetic Similarity Matrix
Genome-wide association studies (GWASs) are often confounded by population stratification and structure. Linear mixed models (LMMs) are a powerful class of methods for uncovering genetic effects, while controlling for such confounding. LMMs include random effects for a genetic similarity matrix, and they assume that a true genetic similarity matrix is known. However, uncertainty about the phylogenetic structure of a study population may degrade the quality of LMM results. This may happen in bacterial studies in which the number of samples or loci is small, or in studies with low-quality genotyping. In this study, we develop methods for linear mixed models in which the genetic similarity matrix is unknown and is derived from Markov chain Monte Carlo estimates of the phylogeny. We apply our model to a GWAS of multidrug resistance in tuberculosis, and illustrate our methods on simulated data.
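The model class described above can be written out in its standard textbook form. The notation is an assumption, not taken from the paper: $y$ is the phenotype vector, $X$ the fixed-effect design matrix, $K$ the genetic similarity matrix, and $K^{(t)}$, $t = 1, \dots, T$, similarity matrices derived from MCMC samples of the phylogeny. The last line shows one natural way to account for an unknown $K$, by averaging over those samples; the paper's exact treatment may differ.

```latex
\begin{align}
  y &= X\beta + g + \varepsilon, \\
  g &\sim \mathcal{N}\!\left(0, \sigma_g^2 K\right), \qquad
  \varepsilon \sim \mathcal{N}\!\left(0, \sigma_e^2 I\right), \\
  p(y \mid X) &\approx \frac{1}{T} \sum_{t=1}^{T} p\!\left(y \mid X, K^{(t)}\right).
\end{align}
```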
Path Selection for Quantum Repeater Networks
Quantum networks will support long-distance quantum key distribution (QKD)
and distributed quantum computation, and are an active area of both
experimental and theoretical research. Here, we present an analysis of
topologically complex networks of quantum repeaters composed of heterogeneous
links. Quantum networks have fundamental behavioral differences from classical
networks; the delicacy of quantum states makes a practical path selection
algorithm imperative, but classical notions of resource utilization are not
directly applicable, rendering known path selection mechanisms inadequate. To
adapt Dijkstra's algorithm for quantum repeater networks that generate
entangled Bell pairs, we quantify the key differences and define a link cost
metric, seconds per Bell pair of a particular fidelity, where a single Bell
pair is the resource consumed to perform one quantum teleportation. Simulations
that include both the physical interactions and the extensive classical
messaging confirm that Dijkstra's algorithm works well in a quantum context.
Across simulations of about three hundred heterogeneous paths, comparing our
path cost with the total work along the path gives a coefficient of
determination of 0.88 or better.
Comment: 12 pages, 8 figures
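The idea of running Dijkstra's algorithm over links weighted in seconds per Bell pair can be sketched as below. This assumes the path cost is the simple sum of link costs, which the abstract does not state explicitly; the node names and cost values are invented for illustration and are not results from the paper.

```python
import heapq


def cheapest_path(links, source, target):
    """Dijkstra's algorithm over additive link costs.

    links: dict mapping node -> list of (neighbor, cost), where cost is a
    hypothetical "seconds per Bell pair" figure for that link.
    Returns (path, total_cost), or (None, inf) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    visited = set()
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, link_cost in links.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor map back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[target]


def undirected(edges):
    """Build an adjacency dict from (a, b, cost) triples."""
    links = {}
    for a, b, cost in edges:
        links.setdefault(a, []).append((b, cost))
        links.setdefault(b, []).append((a, cost))
    return links


# Made-up heterogeneous network: costs in seconds per Bell pair.
network = undirected([
    ("A", "B", 0.5),
    ("B", "D", 0.5),
    ("A", "C", 0.2),
    ("C", "D", 1.5),
])
path, total_cost = cheapest_path(network, "A", "D")
```

In this toy network the route through C starts with the fastest link but loses overall because of its slow second hop, which is the kind of trade-off a per-link cost metric is meant to expose.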
A saposin deficiency model in Drosophila: Lysosomal storage, progressive neurodegeneration and sensory physiological decline
Saposin deficiency is a childhood neurodegenerative lysosomal storage disorder (LSD) that can cause premature death within three months of life. Saposins are activator proteins that promote the function of lysosomal hydrolases that mediate the degradation of sphingolipids. There are four saposin proteins in humans, which are encoded by the prosaposin gene. Mutations causing an absence or impaired function of individual saposins or the whole prosaposin gene lead to distinct LSDs due to the storage of different classes of sphingolipids. The pathological events leading to neuronal dysfunction induced by lysosomal storage of sphingolipids are as yet poorly defined. We have generated and characterised a Drosophila model of saposin deficiency that shows striking similarities to the human diseases. Drosophila saposin-related (dSap-r) mutants show a reduced longevity, progressive neurodegeneration, lysosomal storage, dramatic swelling of neuronal soma, perturbations in sphingolipid catabolism, and sensory physiological deterioration. Our data suggest a genetic interaction with a calcium exchanger (Calx) pointing to a possible calcium homeostasis deficit in dSap-r mutants. Together these findings support the use of dSap-r mutants in advancing our understanding of the cellular pathology implicated in saposin deficiency and related LSDs.
Geographically touring the eastern bloc: British geography, travel cultures and the Cold War
This paper considers the role of travel in the generation of geographical knowledge of the eastern bloc by British geographers. Based on oral history and surveys of published work, the paper examines the roles of three kinds of travel experience: individual private travels, tours via state tourist agencies, and tours by academic delegations. Examples are drawn from across the eastern bloc, including the USSR, Poland, Romania, East Germany and Albania. The relationship between travel and publication is addressed, notably within textbooks, and in the Geographical Magazine. The study argues for the extension of accounts of cultures of geographical travel, and seeks to supplement the existing historiography of Cold War geography.
Leaf litter decomposition -- Estimates of global variability based on Yasso07 model
Litter decomposition is an important process in the global carbon cycle. It
accounts for most of the heterotrophic soil respiration and results in
formation of more stable soil organic carbon (SOC) which is the largest
terrestrial carbon stock. Litter decomposition may induce remarkable feedbacks
to climate change because it is a climate-dependent process. To investigate the
global patterns of litter decomposition, we developed a description of this
process and tested the validity of this description using a large set of foliar
litter mass loss measurements (nearly 10 000 data points derived from
approximately 70 000 litter bags). We applied the Markov chain Monte Carlo
method to estimate uncertainty in the parameter values and results of our model
called Yasso07. The model appeared globally applicable. It estimated the
effects of litter type (plant species) and climate on mass loss with little
systematic error over the first 10 decomposition years, using only initial
litter chemistry, air temperature and precipitation as input variables.
Illustrative of the global variability in litter mass loss rates, our example
calculations showed that a typical conifer litter had 68% of its initial mass
still remaining after two decomposition years in tundra while a deciduous
litter had only 15% remaining in the tropics. Uncertainty in these estimates, a
direct result of the uncertainty of the parameter values of the model, varied
according to the distribution of the litter bag data among climate conditions
and ranged from 2% in tundra to 4% in the tropics. This reliability was
adequate to use the model and distinguish the effects of even small differences
in litter quality or climate conditions on litter decomposition as
statistically significant.
Comment: 19 pages, to appear in Ecological Modelling
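To put the two example figures on a common scale, they can be converted into equivalent annual decay rates under a single-exponential mass-loss curve. This is a deliberate simplification for illustration only: the abstract does not say that Yasso07 has this form, and the function names below are hypothetical.

```python
import math


def implied_decay_rate(remaining, years):
    """Annual rate k such that exp(-k * years) equals `remaining`.

    Assumption: mass loss follows a single exponential, which is a
    simplification and not a description of the Yasso07 model.
    """
    return -math.log(remaining) / years


def fraction_remaining(rate, years):
    """Fraction of initial mass left after `years` at constant rate."""
    return math.exp(-rate * years)


# Figures quoted in the abstract: fraction of mass left after two years.
k_conifer_tundra = implied_decay_rate(0.68, 2.0)
k_deciduous_tropics = implied_decay_rate(0.15, 2.0)
rate_ratio = k_deciduous_tropics / k_conifer_tundra
```

Under this simplification the tropical deciduous litter decays several times faster per year than the tundra conifer litter, consistent with the large global variability the abstract describes.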